# Large-Scale Pretraining
- **Bart Large Teaser De V2** — bettertextapp · Large Language Model · Transformers · 123 downloads · 0 likes
  A large German text-processing model based on the BART architecture, suited to a variety of natural language processing tasks.
- **Bart Large Paraphrase Generator En De V2** — bettertextapp · Machine Translation · Transformers · 121 downloads · 0 likes
  A large-scale English-German paraphrase generation model based on the BART architecture.
- **Instella 3B** — amd · License: Other · Large Language Model · Transformers · 3,048 downloads · 34 likes
  AMD's fully open family of 3-billion-parameter language models, trained on Instinct MI300X GPUs and outperforming open models of similar scale.
- **Vit So400m Patch16 Siglip 512.v2 Webli** — timm · License: Apache-2.0 · Text-to-Image · Transformers · 2,766 downloads · 0 likes
  A vision Transformer based on SigLIP 2, designed for image feature extraction and suited to multilingual vision-language tasks.
- **Longva 7B TPO** — ruili0 · License: MIT · Video-to-Text · Transformers · 225 downloads · 1 like
  LongVA-7B-TPO is a video-text model derived from LongVA-7B through temporal preference optimization, and it excels at long-video understanding tasks.
- **Videollama2.1 7B 16F Base** — DAMO-NLP-SG · License: Apache-2.0 · Video-to-Text · Transformers · English · 179 downloads · 1 like
  VideoLLaMA2.1 is an upgraded version of VideoLLaMA2 that focuses on strengthening spatiotemporal modeling and audio understanding in large video-language models.
- **Depth Anything V2 Base** — depth-anything · 3D Vision · English · 66.95k downloads · 17 likes
  Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and more than 62 million real, unannotated images.
- **4M 21 L** — EPFL-VILAB · License: Other · Multimodal Fusion · 49 downloads · 3 likes
  4M is an "any-to-any" foundation-model training framework, extended to many modalities through tokenization and masking techniques.
- **Chronos T5 Large** — autogluon · License: Apache-2.0 · Climate Model · Transformers · 59.18k downloads · 6 likes
  Chronos is a family of pretrained time-series forecasting models built on language-model architectures; it supports probabilistic forecasting by converting time series into token sequences for training.
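The Chronos entry above hinges on one idea: a real-valued series becomes a token sequence a language model can be trained on. As a rough illustration of that idea — not Chronos's actual tokenizer; the bin count, value range, and mean-scaling scheme here are assumptions — a minimal sketch:

```python
import numpy as np

def tokenize_series(series, n_bins=64, low=-3.0, high=3.0):
    """Mean-scale a real-valued series, then quantize it into integer
    tokens via uniform binning (toy stand-in for a Chronos-style tokenizer)."""
    series = np.asarray(series, dtype=float)
    scale = np.mean(np.abs(series)) or 1.0          # avoid division by zero
    edges = np.linspace(low, high, n_bins - 1)      # n_bins - 1 edges => n_bins bins
    tokens = np.digitize(series / scale, edges)     # ids in [0, n_bins - 1]
    return tokens, scale

def detokenize_series(tokens, scale, n_bins=64, low=-3.0, high=3.0):
    """Approximate inverse: map each token id back to its bin center."""
    edges = np.linspace(low, high, n_bins - 1)
    centers = np.concatenate(([edges[0]], (edges[:-1] + edges[1:]) / 2, [edges[-1]]))
    return centers[np.asarray(tokens)] * scale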
- **Vitamin XL 256px** — jienengchen · License: MIT · Text-to-Image · Transformers · 655 downloads · 1 like
  ViTamin-XL-256px is a vision-language model based on the ViTamin architecture, designed for efficient visual feature extraction and multimodal tasks, with support for high-resolution image processing.
- **Vitamin XL 384px** — jienengchen · License: MIT · Image-to-Text · Transformers · 104 downloads · 20 likes
  ViTamin-XL-384px is a large vision-language model based on the ViTamin architecture, built specifically for vision-language tasks, with support for high-resolution image processing and multimodal feature extraction.
- **Pile T5 Base** — EleutherAI · Large Language Model · Transformers · English · 50 downloads · 19 likes
  Pile-T5 Base is an encoder-decoder model trained on The Pile with the T5x library, run for 2 million steps on an MLM objective over roughly 2 trillion tokens.
- **Stt Fr Fastconformer Hybrid Large Pc** — nvidia · Speech Recognition · French · 1,331 downloads · 5 likes
  A French automatic speech recognition model based on the FastConformer architecture, combining Transducer and CTC decoders for high accuracy and multi-domain adaptability.
- **Chinese Clip Vit Base Patch16** — OFA-Sys · Text-to-Image · Transformers · 49.02k downloads · 104 likes
  The base version of Chinese CLIP, using ViT-B/16 as the image encoder and RoBERTa-wwm-base as the text encoder, trained on roughly 200 million Chinese image-text pairs.
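A dual-encoder model like the Chinese CLIP entry above ranks candidate captions for an image by cosine similarity between the two encoders' embeddings. A minimal numpy sketch of that scoring step, using stand-in embedding vectors (the real model's embeddings come from ViT-B/16 and RoBERTa-wwm-base, and the 0.07 temperature is an assumption borrowed from common CLIP practice):

```python
import numpy as np

def clip_scores(image_emb, text_embs, temperature=0.07):
    """CLIP-style retrieval: L2-normalize both sides, take cosine
    similarities, and softmax over the candidate texts."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    exp = np.exp(logits - logits.max())        # stable softmax
    return exp / exp.sum()
```

The output is a probability distribution over the candidate texts; the same machinery run the other way (one text against many images) gives text-to-image retrieval.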
- **Bertovski** — MaCoCu · Large Language Model · Other · 28 downloads · 1 like
  BERTovski is a large language model pretrained on Bulgarian and Macedonian text using the RoBERTa architecture, produced by the MaCoCu project.
- **Wav2vec2 Large Tedlium** — sanchit-gandhi · License: Apache-2.0 · Speech Recognition · English · 58 downloads · 1 like
  A Wav2Vec2 large speech recognition model fine-tuned on the TED-LIUM corpus for English speech-to-text conversion.
- **Vision Perceiver Fourier** — deepmind · License: Apache-2.0 · Image Classification · Transformers · 1,168 downloads · 2 likes
  Perceiver IO is a general-purpose Transformer architecture that can process multiple modalities; this variant is configured for image classification and pretrained on ImageNet.
- **Deberta Base** — kamalkraj · License: MIT · Large Language Model · Transformers · English · 287 downloads · 0 likes
  DeBERTa (Decoding-enhanced BERT with disentangled attention) improves on BERT and RoBERTa with a disentangled attention mechanism and excels at natural language understanding tasks.
- **Albert Xxlarge V1** — albert · License: Apache-2.0 · Large Language Model · Transformers · English · 930 downloads · 5 likes
  ALBERT XXLarge v1 is a Transformer model pretrained on English text with a masked language modeling (MLM) objective, and features cross-layer parameter sharing.
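Several entries above (Pile-T5, ALBERT) cite a masked language modeling (MLM) objective: hide a fraction of the input tokens and train the model to predict them. A toy sketch of the masking step — the 15% rate follows BERT-style convention, and real pipelines additionally use subword tokenizers and random/keep replacements, so this is an illustration only:

```python
import random

MASK = "[MASK]"

def mask_for_mlm(tokens, rate=0.15, seed=0):
    """Replace roughly `rate` of the tokens with [MASK]; labels hold the
    original token at masked positions and None everywhere else."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(MASK)
            labels.append(tok)      # model must recover this token
        else:
            masked.append(tok)
            labels.append(None)     # position contributes no loss
    return masked, labels
```

During pretraining, the loss is computed only at the positions whose label is not None.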